On Rational Delegations in Liquid Democracy
Liquid democracy is a proxy voting method where proxies are delegable. We
propose and study a game-theoretic model of liquid democracy to address the
following question: when is it rational for a voter to delegate her vote? We
study the existence of pure-strategy Nash equilibria in this model, and how
group accuracy is affected by them. We complement these theoretical results
with agent-based simulations that examine the effects of delegations on the
group's accuracy over variously structured social networks.

Comment: 17 pages, 3 figures. This paper (without the Appendix) appears in the
proceedings of AAAI'1
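The delegation question above can be made concrete with a toy simulation. This is a minimal sketch and not the authors' model: the accuracy distribution, random graph, delegation rule, and every parameter here are illustrative assumptions. Each voter delegates to her most accurate neighbour when that neighbour is sufficiently better informed, and majority accuracy is estimated over repeated trials.

```python
import random

def simulate(n=51, p_edge=0.15, trials=2000, threshold=0.05, seed=0):
    rng = random.Random(seed)
    acc = [rng.uniform(0.5, 0.9) for _ in range(n)]          # voter competences
    graph = [[j for j in range(n) if j != i and rng.random() < p_edge]
             for i in range(n)]
    # Delegate to the most accurate neighbour if she beats your own accuracy
    # by at least `threshold`; accuracy strictly increases along a delegation
    # chain, so chains cannot cycle.
    proxy = list(range(n))
    for i in range(n):
        if graph[i]:
            best = max(graph[i], key=lambda j: acc[j])
            if acc[best] > acc[i] + threshold:
                proxy[i] = best

    def guru(i):
        while proxy[i] != i:
            i = proxy[i]
        return i

    weight = [0] * n                     # votes accumulated by each final voter
    for i in range(n):
        weight[guru(i)] += 1

    correct = 0
    for _ in range(trials):
        yes = sum(w for g, w in enumerate(weight) if w and rng.random() < acc[g])
        if yes > n / 2:
            correct += 1
    return correct / trials

print(round(simulate(), 3))
```

Varying `threshold` or `p_edge` lets one probe, in this toy setting, how readily delegation concentrates weight on a few highly accurate voters.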
Determining Accessible Sidewalk Width by Extracting Obstacle Information from Point Clouds
Obstacles on the sidewalk often block the path, limiting passage and
resulting in frustration and wasted time, especially for citizens and visitors
who use assistive devices (wheelchairs, walkers, strollers, canes, etc.). To
enable equal participation and use of the city, all citizens should be able to
perform and complete their daily activities in a similar amount of time and
effort. Therefore, we aim to offer accessibility information regarding
sidewalks, so that citizens can better plan their routes, and to help city
officials identify the location of bottlenecks and act on them. In this paper
we propose a novel pipeline to estimate obstacle-free sidewalk widths based on
3D point cloud data of the city of Amsterdam, as the first step to offer a more
complete set of information regarding sidewalk accessibility.

Comment: 4 pages, 9 figures. Presented at the workshop on "The Future of Urban
Accessibility" at ACM ASSETS'22. Code for this paper is available at
https://github.com/Amsterdam-AI-Team/Urban_PointCloud_Sidewalk_Widt
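As a hypothetical simplification of the final step such a pipeline might take (not the authors' code): once obstacles extracted from the 3D point cloud are projected onto a sidewalk cross-section, the obstacle-free width reduces to finding the widest unblocked interval.

```python
def free_width(sidewalk=(0.0, 3.0), blocked=((0.8, 1.2), (2.4, 2.6))):
    """Widest unblocked interval across a sidewalk cross-section (metres).

    `sidewalk` is the (start, end) extent of the cross-section; `blocked`
    holds the intervals occupied by obstacles after 2D projection.
    """
    lo, hi = sidewalk
    events = sorted((max(a, lo), min(b, hi)) for a, b in blocked)
    widest, cursor = 0.0, lo
    for a, b in events:
        widest = max(widest, a - cursor)   # gap before this obstacle
        cursor = max(cursor, b)            # merge overlapping obstacles
    return max(widest, hi - cursor)        # gap after the last obstacle

print(free_width())  # widest unblocked interval, ~1.2 m in this example
```

A route planner could then compare this value against the clearance a given assistive device requires.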
Lenient Multi-Agent Deep Reinforcement Learning
Much of the success of single agent deep reinforcement learning (DRL) in
recent years can be attributed to the use of experience replay memories (ERM),
which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling
stored state transitions. However, care is required when using ERMs for
multi-agent deep reinforcement learning (MA-DRL), as stored transitions can
become outdated because agents update their policies in parallel [11]. In this
work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to
decaying temperature values that control the amount of leniency applied towards
negative policy updates that are sampled from the ERM. This introduces optimism
in the value-function update, and has been shown to facilitate cooperation in
tabular fully-cooperative multi-agent reinforcement learning problems. We
evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN
(HDQN) algorithm [22] as well as a modified version we call scheduled-HDQN,
that uses average reward learning near terminal states. Evaluations take place
in extended variations of the Coordinated Multi-Agent Object Transportation
Problem (CMOTP) [8] which include fully-cooperative sub-tasks and stochastic
rewards. We find that LDQN agents are more likely to converge to the optimal
policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN
agents.

Comment: 9 pages, 6 figures, AAMAS2018 Conference Proceeding
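The leniency mechanism described above can be sketched in a simplified tabular form. The paper applies it inside deep Q-networks; the hyperparameters, leniency schedule, and class shape here are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import defaultdict

class LenientQ:
    """Tabular Q-learner with lenient (temperature-gated) negative updates."""

    def __init__(self, alpha=0.1, gamma=0.95, t0=50.0, decay=0.995, k=1.0, seed=0):
        self.Q = defaultdict(float)          # Q[(state, action)]
        self.T = defaultdict(lambda: t0)     # per-(state, action) temperature
        self.alpha, self.gamma, self.decay, self.k = alpha, gamma, decay, k
        self.rng = random.Random(seed)

    def update(self, s, a, r, s_next, next_actions):
        target = r + self.gamma * max((self.Q[(s_next, b)] for b in next_actions),
                                      default=0.0)
        delta = target - self.Q[(s, a)]
        # Leniency is high while the pair's temperature is high, so early
        # negative updates are usually ignored (optimism); as the temperature
        # decays with each visit, negative updates are accepted more often.
        leniency = 1.0 - math.exp(-self.k * self.T[(s, a)])
        if delta > 0 or self.rng.random() > leniency:
            self.Q[(s, a)] += self.alpha * delta
        self.T[(s, a)] *= self.decay
        return self.Q[(s, a)]
```

Positive updates are always applied, so the optimism only suppresses penalties caused by exploring teammates, not genuine improvements.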
Stability of Human-Inspired Agent Societies
Models of emotion, particularly those based on the Ortony, Clore, and Collins (OCC) account, have been used as part of agents' decision-making processes to explore their effects on cooperation within social dilemmas [7, 19, 22]. We analyse two different interpretations of OCC agents. The first is the Emotional agent, which decides its actions using only a model of emotions; we analyse the possibility of evolutionary stability of these agents using the Prisoner's Dilemma game. We contrast the results with the second interpretation, the Moody agent [7], which additionally uses a psychology-grounded model of mood. Our analysis highlights the different strategies that are needed to achieve success as a society, in terms of both stability and cooperation, in the iterated Prisoner's Dilemma. The Emotional agents are better suited to playing against a mixed group of agents with differing strategies than the Moody agents are, while the Moody agents are more successful than the Emotional agents when only one strategy exists in the society.
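The scaffolding such an analysis runs on, as opposed to the OCC and mood models themselves, can be sketched as an iterated Prisoner's Dilemma with the standard payoffs; the strategies below are classic baselines, not the paper's agents.

```python
# Standard PD payoffs: (row player's score, column player's score).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play(s1, s2, rounds=100):
    """Iterate the PD; each strategy sees only the opponent's history."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h2), s2(h1)
        p1, p2 = PAYOFF[(m1, m2)]
        h1.append(m1); h2.append(m2)
        score1 += p1; score2 += p2
    return score1, score2

tit_for_tat = lambda opp: opp[-1] if opp else 'C'
always_defect = lambda opp: 'D'

print(play(tit_for_tat, always_defect))  # → (99, 104)
```

Replacing these baselines with emotion- or mood-driven decision rules, and scoring every pairing in a population, is what a stability analysis of the kind described above amounts to.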
Robust Temporal Difference Learning for Critical Domains
We present a new Q-function operator for temporal difference (TD) learning
methods that explicitly encodes robustness against significant rare events
(SRE) in critical domains. The operator, which we call the κ-operator,
allows a robust policy to be learned in a model-based fashion without actually
observing the SRE. We introduce single- and multi-agent robust TD methods using
the κ-operator. We prove convergence of the operator to the optimal
robust Q-function with respect to the model using the theory of Generalized
Markov Decision Processes. In addition, we prove convergence to the optimal
Q-function of the original MDP given that the probability of SREs vanishes.
Empirical evaluations demonstrate the superior performance of κ-based TD
methods both in the early learning phase and in the final converged
stage. In addition, we show the robustness of the proposed method to small model
errors, as well as its applicability in a multi-agent context.

Comment: AAMAS 201
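As a hedged illustration of the idea, and not the paper's exact operator: a robust TD target can blend the usual bootstrapped value with a modelled worst-case value, weighted by a parameter kappa, so that a rare damaging event shapes the policy without ever being observed. The function shape and parameter values below are assumptions for the sketch.

```python
def robust_td_update(Q, s, a, r, s_next, worst_case_value,
                     alpha=0.1, gamma=0.95, kappa=0.2):
    """One robust TD(0) update on a dict-of-dicts Q-table.

    `worst_case_value` is the model-based value of the significant rare
    event from s_next; kappa weights it against the standard greedy target.
    """
    standard = max(Q[s_next].values())
    target = r + gamma * ((1 - kappa) * standard + kappa * worst_case_value)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

With kappa = 0 this reduces to ordinary Q-learning; raising kappa trades expected return for protection against the modelled rare event.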